A Unified Convex Surrogate for the Schatten-p Norm
Abstract
The Schatten-p norm (0 < p < 1) has been widely used to replace the nuclear norm for better approximating the rank function. However, existing methods are either 1) not scalable for large-scale problems because they rely on singular value decomposition (SVD) in every iteration, or 2) specific to some p values, e.g., 1/2 and 2/3. In this paper, we show that for any p, p1, and p2 > 0 satisfying 1/p = 1/p1 + 1/p2, there is an equivalence between the Schatten-p norm of one matrix and the Schatten-p1 and Schatten-p2 norms of its two factor matrices. We further extend the equivalence to multiple factor matrices and show that all the factor norms can be convex and smooth for any p > 0. In contrast, the original Schatten-p norm for 0 < p < 1 is non-convex and non-smooth. As an example, we conduct experiments on matrix completion. To utilize the convexity of the factor matrix norms, we adopt the accelerated proximal alternating linearized minimization algorithm and establish its sequence convergence. Experiments on both synthetic and real datasets exhibit its superior performance over the state-of-the-art methods. Its speed is also highly competitive.

Introduction

In recent years, low-rank matrix minimization has found wide applications, e.g., matrix completion (Candès and Recht, 2009), low-rank representation (Liu, Lin, and Yu, 2010), and multi-task learning (Dudik, Harchaoui, and Malick, 2012). Often, we can formulate the problem as follows:

    min_X F(X) = min_X f(X) + λ Ω(X),    (1)

where f(·): R^{m×n} → R is the loss function, Ω(·): R^{m×n} → R is the spectral regularizer (Abernethy et al., 2009) that ensures low-rankness, and λ ∈ R balances the two terms. As the tightest convex envelope of the rank function on the unit ball of the matrix operator norm, the nuclear norm is often suggested for Ω(X) (Recht, Fazel, and Parrilo, 2010; Candès and Tao, 2010). In fact, the nuclear norm is the ℓ1-norm of the vector of singular values; it achieves low-rankness by encouraging sparseness of the singular values.

As Fan and Li (2001) pointed out, the ℓ1-norm is a loose approximation to the ℓ0-norm and over-penalizes large entries of vectors. By the analogy between the rank function of matrices and the ℓ0-norm of vectors, the nuclear norm likewise over-penalizes large singular values. As a tighter approximation to the rank function, the Schatten-p quasi-norm (0 < p < 1) has been suggested to replace the nuclear norm (Nie, Huang, and Ding, 2012). For the task of matrix completion, the Schatten-p quasi-norm has empirically been shown to be superior to the nuclear norm. Moreover, Zhang, Huang, and Zhang (2013) theoretically prove that, for the matrix completion problem, Schatten-p quasi-norm minimization with a small p requires far fewer observed entries than nuclear norm minimization does.

However, the Schatten-p quasi-norm is non-convex and non-smooth, so the optimization of problem (1) becomes much more challenging. Recently, Lai, Xu, and Yin (2013) proposed an iteratively reweighted least squares algorithm (IRucLp) that solves a smoothed subproblem approximating the Schatten-p quasi-norm at each iteration. They prove that any limit point of the generated sequence is a stationary point. Moreover, Lu et al. (2014) propose the iteratively reweighted nuclear norm (IRNN) algorithm. Besides the Schatten-p quasi-norm, IRNN is able to handle a variety of regularizers on the singular values, e.g., MCP (Zhang, 2010) and SCAD (Fan and Li, 2001). A convergence result similar to that of IRucLp is also established. However, both algorithms involve computing an SVD at each iteration, which is expensive for large-scale problems.
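To make this bottleneck concrete, below is a minimal sketch in Python/NumPy of evaluating the Schatten-p quasi-norm (illustrative code of ours, not from the paper; the helper name `schatten_p` is hypothetical). Every evaluation requires the full set of singular values, i.e., an SVD of the whole matrix:

```python
import numpy as np

def schatten_p(X, p):
    """Schatten-p (quasi-)norm of X: the l_p norm of its singular values.

    A proper norm for p >= 1 and a non-convex quasi-norm for 0 < p < 1.
    Evaluating it needs all singular values, hence a full SVD -- the
    per-iteration bottleneck of SVD-based methods such as IRucLp and IRNN.
    """
    s = np.linalg.svd(X, compute_uv=False)  # O(m n min(m, n)) per call
    return float((s ** p).sum() ** (1.0 / p))

X = np.random.randn(500, 300)
print(schatten_p(X, 1.0))  # nuclear norm: sum of singular values
print(schatten_p(X, 0.5))  # Schatten-1/2 quasi-norm
```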
As an alternative to (1), bilinear factorization with two factor matrix norm regularizers has been suggested (Srebro, Rennie, and Jaakkola, 2004; Cabral et al., 2013; Shang, Liu, and Cheng, 2016a):

    min_{U,V} F(U, V) = min_{U,V} f(UV^T) + λ (Ω_u(U) + Ω_v(V)),    (2)

where U ∈ R^{m×d} and V ∈ R^{n×d} are the unknown factor matrices. Quite often, d ≪ min{m, n} holds. When minimizing (2), one only needs to operate on two much smaller factor matrices, in contrast to the full-dimensional X in (1); thus (2) is better suited to large-scale applications. As Srebro, Rennie, and Jaakkola (2004) indicated, when Ω_u(U) + Ω_v(V) = ‖U‖_F²/2 + ‖V‖_F²/2, problem (2) can be equivalently regarded as the surrogate of (1) with Ω(X) = ‖X‖_* when enforcing X = UV^T. Let r* denote the rank of the optimal X* in (1); Mazumder, Hastie, and Tibshirani (2010) proved that the minimum objective function values of (1) and (2) are equal once d ≥ r*. Quite recently, Shang, Liu, and Cheng (2016a,b) extended the surrogate of the nuclear norm regularizer Ω(X) to those of specific Schatten-p norms, where p = 1/3, 1/2, or 2/3. They proposed to use the proximal alternating linearized minimization (PALM) algorithm and established its sequence convergence. Motivated by these results, we further extend the surrogate to the general Schatten-p norm. The contributions of this paper are as follows:

(a) We show that for any p, p1, and p2 > 0 satisfying 1/p = 1/p1 + 1/p2, there is an equivalence between the Schatten-p norm of X and the Schatten-p1 and Schatten-p2 norms of U and V when enforcing X = UV^T (see Theorem 1). The existing surrogates for p = 1, 1/2, and 2/3 are only special cases of ours. We also give an entirely different and much simpler proof than the existing ones.

(b) We extend the above result to multiple factor matrices (see Corollary 1) and show that each factor matrix norm of the surrogate can be convex and smooth for any p > 0. In contrast, the Schatten-p norm (0 < p < 1) is non-smooth and non-convex, and the results of Shang, Liu, and Cheng (2016a,b) are limited to the two- and three-factor cases, all of which involve the non-smooth nuclear norm.

(c) We unify the minimization of (1) and (2) for general Schatten-p norm regularizers, where the former is reformulated into the latter (see Theorem 2). We also show that the factorization formulation should be preferred when 0 < p < 1.

(d) We conduct experiments on matrix completion as an example to test our framework. By exploiting the convexity of the factor matrix norms, our accelerated proximal alternating algorithm achieves state-of-the-art performance. We also prove its sequence convergence.

Notations and Background

Consider the SVD of a matrix X ∈ R^{m×n}: X = U_X diag(σ_i(X)) V_X^T, where σ_i(X) denotes its i-th singular value in descending order. Then the Schatten-p norm (0 < p < ∞) of X is defined as

    ‖X‖_{S_p} ≜ ( ∑_{i=1}^{min{m,n}} σ_i^p(X) )^{1/p}.
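As a quick numerical check connecting this definition with the factorization equivalence discussed above, the following sketch (our own illustrative code, assuming the equivalence takes the product form ‖X‖_{S_p} = min_{X=UV^T} ‖U‖_{S_{p1}} ‖V‖_{S_{p2}}, as in the closely related results of Shang, Liu, and Cheng (2016a,b)) builds the balanced factorization U = U_X diag(σ)^{p/p1}, V = V_X diag(σ)^{p/p2} from the SVD of X and verifies that it reproduces X and attains ‖X‖_{S_p}:

```python
import numpy as np

def schatten_p(X, p):
    """Schatten-p (quasi-)norm of X: the l_p norm of its singular values."""
    s = np.linalg.svd(X, compute_uv=False)
    return float((s ** p).sum() ** (1.0 / p))

rng = np.random.default_rng(0)
X = rng.standard_normal((60, 40))

for p1, p2 in [(2.0, 2.0), (1.0, 1.0)]:     # gives p = 1 and p = 1/2
    p = 1.0 / (1.0 / p1 + 1.0 / p2)
    # Balanced factorization from the SVD of X:
    #   U = U_X diag(s)^{p/p1},  V = V_X diag(s)^{p/p2},
    # so U V^T = U_X diag(s) V_X^T = X, because p/p1 + p/p2 = 1.
    Ux, s, VxT = np.linalg.svd(X, full_matrices=False)
    U = Ux * s ** (p / p1)
    V = VxT.T * s ** (p / p2)
    assert np.allclose(U @ V.T, X)
    # The two printed values agree up to numerical precision.
    print(p, schatten_p(X, p), schatten_p(U, p1) * schatten_p(V, p2))
```

Choosing p1 = p2 = 2 recovers the classical Frobenius surrogate of the nuclear norm (p = 1), while p1 = p2 = 1 yields p = 1/2 with two convex (though non-smooth, nuclear-norm) factor terms; factor norms that are both convex and smooth for any p > 0 are what the multi-factor extension (Corollary 1) provides.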